Colouring Summaries BLEU
Authors
Abstract
In this paper we apply the IBM algorithm, BLEU, to the output of four different summarizers in order to perform an intrinsic evaluation of their output. The objective of this experiment is to explore whether a metric originally developed for evaluating machine translation output can be used reliably for assessing another type of output. By changing the type of text evaluated by BLEU to automatically generated extracts, and by setting the conditions and parameters of the evaluation experiment according to the idiosyncrasies of the task, we test the feasibility of porting BLEU to other Natural Language Processing research areas. Furthermore, some important conclusions about the resources needed for evaluating summaries emerged as a side effect of running the experiment.
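The abstract does not restate how BLEU itself is computed; for readers unfamiliar with the metric, the following is a minimal Python sketch of the standard computation (modified n-gram precisions combined through a geometric mean and scaled by a brevity penalty, per Papineni et al. 2002), here scored against multiple reference summaries. The function names and toy sentences are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped by
    its maximum count in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    """Geometric mean of modified 1..max_n-gram precisions, scaled by a
    brevity penalty computed against the closest reference length."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # geometric mean collapses if any precision is zero
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)

# Toy example: one machine extract scored against two human summaries.
extract = "the cat sat on the mat".split()
refs = ["the cat was sitting on the mat".split(),
        "a cat sat on the mat".split()]
print(f"BLEU = {bleu(extract, refs):.3f}")
```

Note that the brevity penalty is arguably less informative for extracts produced at a fixed compression rate than for translations of variable length, which is one illustration of why, as the abstract says, the parameters of the evaluation have to be reset according to the idiosyncrasies of the summarization task.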
Similar resources
Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics
Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics; while direct a...
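As a companion to the abstract above, here is a minimal sketch of a recall-oriented unigram co-occurrence score between a system summary and a set of reference summaries, the quantity this line of work later standardized as ROUGE-1. Pooling clipped counts across all references is an illustrative choice, not necessarily the exact formulation used in the paper.

```python
from collections import Counter

def unigram_cooccurrence(candidate, references):
    """Recall-oriented unigram co-occurrence: the fraction of reference
    unigrams (with clipped counts) also present in the candidate summary."""
    cand_counts = Counter(candidate)
    matched = total = 0
    for ref in references:
        ref_counts = Counter(ref)
        matched += sum(min(count, cand_counts[w])
                       for w, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0

system = "police killed the gunman".split()
references = ["police kill the gunman".split(),
              "the gunman was shot by police".split()]
print(f"unigram co-occurrence = {unigram_cooccurrence(system, references):.3f}")
```

A recall-oriented statistic of this kind rewards covering the references' content rather than avoiding spurious n-grams, which is consistent with the abstract's finding that unigram co-occurrence tracks human judgments better than a direct application of BLEU.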
Manual And Automatic Evaluation Of Summaries
In this paper we discuss manual and automatic evaluations of summaries using data from the Document Understanding Conference 2001 (DUC-2001). We first show the instability of the manual evaluation. Specifically, the low inter-human agreement indicates that more reference summaries are needed. To investigate the feasibility of automated summary evaluation based on the recent BLEU method from mach...
Vert: a Method for Automatic Evaluation of Video Summaries
Video Summarization has become an important tool for Multimedia Information processing, but the automatic evaluation of a video summarization system remains a challenge. A major issue is that an ideal "best" summary does not exist, although people can easily distinguish "good" from "bad" summaries. A similar situation arises in machine translation and text summarization, where specific automatic...
Headline Generation for Written and Broadcast News
This technical report is an overview of work done on Headline Generation for written and broadcast news. The report covers HMM Hedge, a statistical approach based on the noisy channel model; Hedge Trimmer, a parse-and-trim approach using linguistically motivated trimming rules; and Topiary, a combination of Trimmer and Unsupervised Topic Discovery. Automatic evaluation of summaries using ROUGE a...
Comparison of Some Automatic and Manual Methods for Summary Evaluation Based on the Text Summarization Challenge 2
In this paper, we compare some automatic and manual methods for summary evaluation. One of the essential points for evaluating a summary is how well the evaluation measure recognizes slight differences in the quality of the computer-produced summaries. In terms of this point, we examined ‘evaluation by revision’ using the data of the Text Summarization Challenge 2 (TSC2). Evaluation by revision...
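The snippet above does not define "evaluation by revision" precisely; one plausible reading is that a summary's quality is inversely related to how much editing a human must do to fix it. The sketch below is an assumption for illustration, not the TSC2 procedure: it approximates revision effort with a word-level edit distance between the system summary and its human-revised version, normalized by the revised length.

```python
def word_edit_distance(a, b):
    """Levenshtein distance over word tokens (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete
                           cur[j - 1] + 1,              # insert
                           prev[j - 1] + (wa != wb)))   # substitute
        prev = cur
    return prev[-1]

def revision_score(system_summary, revised_summary):
    """Lower is better: edits per word needed to repair the system summary.
    Normalizing by the revised length is an illustrative choice."""
    sys_toks = system_summary.split()
    rev_toks = revised_summary.split()
    return word_edit_distance(sys_toks, rev_toks) / max(len(rev_toks), 1)

print(revision_score("the gunman police killed",
                     "police killed the gunman"))
```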
Journal:
Volume / Issue:
Pages: -
Publication date: 2003